
    Palgol: A High-Level DSL for Vertex-Centric Graph Processing with Remote Data Access

    Pregel is a popular distributed computing model for dealing with large-scale graphs. However, it can be tricky to implement graph algorithms correctly and efficiently in Pregel's vertex-centric model, especially when the algorithm has multiple computation stages, complicated data dependencies, or even communication over dynamic internal data structures. Some domain-specific languages (DSLs) have been proposed to provide more intuitive ways to implement graph algorithms, but due to the lack of support for remote access --- reading or writing attributes of other vertices through references --- they cannot handle the above-mentioned dynamic communication, making a class of Pregel algorithms with fast convergence impossible to implement. To address this problem, we design and implement Palgol, a more declarative and powerful DSL which supports remote access. In particular, programmers can use a more declarative syntax called chain access to naturally specify dynamic communication as if directly reading data on arbitrary remote vertices. By analyzing the logic patterns of chain access, we provide a novel algorithm for compiling Palgol programs to efficient Pregel code. We demonstrate the power of Palgol by using it to implement several practical Pregel algorithms, and the evaluation results show that the efficiency of Palgol is comparable with that of hand-written code. Comment: 12 pages, 10 figures, extended version of APLAS 2017 paper
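
    A minimal, single-machine sketch of the vertex-centric (Pregel-style) model that Palgol targets: in each superstep a vertex reads its incoming messages, updates its value, and sends messages to its neighbours, and the computation stops once no messages remain. The min-label connected-components logic, the toy graph, and all names below are illustrative assumptions, not Palgol syntax or a real Pregel runtime.

        def pregel_min_label(adj):
            """adj: dict mapping each vertex to a list of its neighbours (undirected graph)."""
            value = {v: v for v in adj}                      # each vertex starts with its own id as label
            # Superstep 0: every vertex announces its label to its neighbours.
            inbox = {v: [value[u] for u in adj[v]] for v in adj}
            while any(inbox.values()):                       # run supersteps until no messages are sent
                outbox = {v: [] for v in adj}
                for v, msgs in inbox.items():
                    if msgs and min(msgs) < value[v]:        # adopt the smallest label seen so far
                        value[v] = min(msgs)
                        for u in adj[v]:
                            outbox[u].append(value[v])       # propagate the improvement
                inbox = outbox
            return value                                     # equal labels = same connected component

        # Example: two components {0, 1, 2} and {3, 4}.
        print(pregel_min_label({0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}))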

    Finding 2-Edge and 2-Vertex Strongly Connected Components in Quadratic Time

    We present faster algorithms for computing the 2-edge and 2-vertex strongly connected components of a directed graph, which are straightforward generalizations of strongly connected components. While in undirected graphs the 2-edge and 2-vertex connected components can be found in linear time, in directed graphs only rather simple O(mn)-time algorithms were known. We use a hierarchical sparsification technique to obtain algorithms that run in time O(n^2). For 2-edge strongly connected components our algorithm gives the first running time improvement in 20 years. Additionally we present an O(m^2 / log n)-time algorithm for 2-edge strongly connected components, and thus improve over the O(mn) running time also when m = O(n). Our approach extends to k-edge and k-vertex strongly connected components for any constant k with a running time of O(n^2 log^2 n) for edges and O(n^3) for vertices.
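
    For orientation, here is a hedged baseline sketch of ordinary strongly connected components, computed with Kosaraju's linear-time two-pass algorithm, which is the notion the 2-edge and 2-vertex variants generalize. It is not the paper's hierarchical sparsification algorithm, and the vertex/edge-list encoding is an assumption.

        def kosaraju_scc(n, edges):
            """n: number of vertices 0..n-1; edges: iterable of directed (u, v) pairs."""
            fwd = [[] for _ in range(n)]
            rev = [[] for _ in range(n)]
            for u, v in edges:
                fwd[u].append(v)
                rev[v].append(u)

            # Pass 1: iterative DFS on the forward graph, recording vertices by finish time.
            order, seen = [], [False] * n
            for s in range(n):
                if seen[s]:
                    continue
                seen[s] = True
                stack = [(s, iter(fwd[s]))]
                while stack:
                    v, it = stack[-1]
                    w = next(it, None)
                    if w is None:
                        order.append(v)                  # v is finished
                        stack.pop()
                    elif not seen[w]:
                        seen[w] = True
                        stack.append((w, iter(fwd[w])))

            # Pass 2: DFS on the reversed graph in decreasing finish time; each tree is one SCC.
            comp = [-1] * n
            label = 0
            for s in reversed(order):
                if comp[s] != -1:
                    continue
                comp[s] = label
                stack = [s]
                while stack:
                    v = stack.pop()
                    for w in rev[v]:
                        if comp[w] == -1:
                            comp[w] = label
                            stack.append(w)
                label += 1
            return comp                                  # comp[v] = index of v's SCC

        # Example: a 3-cycle plus a tail vertex gives components [0, 0, 0, 1] (up to labelling).
        print(kosaraju_scc(4, [(0, 1), (1, 2), (2, 0), (2, 3)]))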

    Polynomial algorithms for the Maximal Pairing Problem: efficient phylogenetic targeting on arbitrary trees

    Background: The Maximal Pairing Problem (MPP) is the prototype of a class of combinatorial optimization problems that are of considerable interest in bioinformatics: Given an arbitrary phylogenetic tree T and weights ω_xy for the paths between any two leaves (x, y), what is the collection of edge-disjoint paths between pairs of leaves that maximizes the total weight? Special cases of the MPP for binary trees and equal weights have been described previously; algorithms to solve the general MPP are still missing, however. Results: We describe a relatively simple dynamic programming algorithm for the special case of binary trees. We then show that the general case of multifurcating trees can be treated by interleaving solutions to certain auxiliary Maximum Weighted Matching problems with an extension of this dynamic programming approach, resulting in an overall polynomial-time solution of complexity O(n^4 log n) w.r.t. the number n of leaves. The source code of a C implementation can be obtained under the GNU Public License from http://www.bioinf.uni-leipzig.de/Software/Targeting. For binary trees, we furthermore discuss several constrained variants of the MPP as well as a partition function approach to the probabilistic version of the MPP. Conclusions: The algorithms introduced here make it possible to solve the MPP also for large trees with high-degree vertices. This has practical relevance in the field of comparative phylogenetics and, for example, in the context of phylogenetic targeting, i.e., data collection with resource limitations.
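
    The Maximum Weighted Matching sub-step mentioned above can be illustrated with networkx; the toy graph, vertex names, and weights below are invented, and this sketch is not the paper's released C implementation.

        import networkx as nx

        # Toy auxiliary matching instance at a multifurcating node: vertices stand for the
        # child subtrees, and each edge weight is the best pairing value the tree DP would
        # supply (values invented for illustration).
        g = nx.Graph()
        g.add_weighted_edges_from([("t1", "t2", 3.0), ("t1", "t3", 2.0), ("t2", "t4", 1.5)])

        matching = nx.max_weight_matching(g)   # here it picks t1-t3 and t2-t4: total 3.5 beats t1-t2 alone (3.0)
        print(matching)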

    On vertex adjacencies in the polytope of pyramidal tours with step-backs

    We consider the traveling salesperson problem in a directed graph. The pyramidal tours with step-backs are a special class of Hamiltonian cycles for which the traveling salesperson problem is solved by dynamic programming in polynomial time. The polytope of pyramidal tours with step-backs PSB(n) is defined as the convex hull of the characteristic vectors of all possible pyramidal tours with step-backs in a complete directed graph. The skeleton of PSB(n) is the graph whose vertex set is the vertex set of PSB(n) and whose edge set is the set of geometric edges or one-dimensional faces of PSB(n). The main result of the paper is a necessary and sufficient condition for vertex adjacencies in the skeleton of the polytope PSB(n) that can be verified in polynomial time. Comment: in English

    Greedy Shortest Common Superstring Approximation in Compact Space

    Given a set of strings, the shortest common superstring problem is to find the shortest possible string that contains all the input strings. The problem is NP-hard, but a lot of work has gone into designing approximation algorithms for solving the problem. We present the first time and space efficient implementation of the classic greedy heuristic which merges strings in decreasing order of overlap length. Our implementation works in O(n log σ) time and O(n log σ) bits of space, where n is the total length of the input strings in characters, and σ is the size of the alphabet. After index construction, a practical implementation of our algorithm uses roughly 5n log σ bits of space and reasonable time for a real dataset that consists of DNA fragments.
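
    A naive plain-Python sketch of the classic greedy heuristic described above: repeatedly merge the pair of strings with the longest overlap. This version runs in quadratic time and does not treat containment of one string in another specially; achieving O(n log σ) time and compact space is precisely the paper's contribution, so this is only an illustration of the heuristic's behaviour.

        def overlap(a, b):
            """Length of the longest suffix of a that is a prefix of b."""
            for k in range(min(len(a), len(b)), 0, -1):
                if a.endswith(b[:k]):
                    return k
            return 0

        def greedy_scs(strings):
            strings = [s for s in strings if s]               # drop empty strings
            while len(strings) > 1:
                best = (-1, None, None)                       # (overlap length, i, j)
                for i, a in enumerate(strings):
                    for j, b in enumerate(strings):
                        if i != j:
                            k = overlap(a, b)
                            if k > best[0]:
                                best = (k, i, j)
                k, i, j = best
                merged = strings[i] + strings[j][k:]          # merge the best pair
                strings = [s for t, s in enumerate(strings) if t not in (i, j)] + [merged]
            return strings[0] if strings else ""

        print(greedy_scs(["ACGT", "GTTA", "TTAAC"]))          # -> "ACGTTAAC"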

    An integrated approach to a combinatorial optimisation problem

    Funding: MRC grant MR/S003819/1 and Health Data Research UK, an initiative funded by UK Research and Innovation, Department of Health and Social Care (England) and the devolved administrations, and leading medical research charities. We take inspiration from a problem from the healthcare domain, where patients with several chronic conditions follow different guidelines designed for the individual conditions, and where the aim is to find the best treatment plan for a patient that avoids adverse drug reactions, respects the patient's preferences and prioritises drug efficacy. Each chronic condition guideline can be abstractly described by a directed graph, where each node indicates a treatment step (e.g., a choice in medications or resources) and has a certain duration. The search for the best treatment path is seen as a combinatorial optimisation problem, and we show how to select a path across the graphs constrained by a notion of resource compatibility. This notion takes into account interactions between any finite number of resources, and makes it possible to express non-monotonic interactions. Our formalisation also introduces a discrete temporal metric, so as to consider only simultaneous nodes in the optimisation process. We express the formal problem as an SMT problem and provide a correctness proof of the SMT code by exploiting the interplay between SMT solvers and the proof assistant Isabelle/HOL. The problem we consider combines aspects of optimal graph execution and resource allocation, showing how an SMT solver can be an alternative to other approaches which are well-researched in the corresponding domains.
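
    A hedged sketch of the kind of SMT encoding described above, using the Z3 Python API: Boolean selection variables pick one treatment step per guideline, a compatibility constraint forbids one invented adverse drug pair, and the objective maximises total efficacy. The guidelines, drugs, and scores are made up, and the paper's actual encoding (and its Isabelle/HOL correctness proof) is considerably richer.

        from z3 import Bool, If, Not, And, Optimize, Sum, sat, is_true

        # Invented guidelines: each maps candidate treatment steps to an efficacy score.
        guidelines = {
            "hypertension": {"ace_inhibitor": 8, "beta_blocker": 6},
            "diabetes":     {"metformin": 9, "sulfonylurea": 5},
        }
        incompatible = [("beta_blocker", "sulfonylurea")]    # made-up adverse interaction

        pick = {d: Bool(d) for g in guidelines for d in guidelines[g]}
        opt = Optimize()

        for g, drugs in guidelines.items():                  # exactly one step per guideline
            opt.add(Sum([If(pick[d], 1, 0) for d in drugs]) == 1)

        for a, b in incompatible:                            # resource-compatibility constraint
            opt.add(Not(And(pick[a], pick[b])))

        opt.maximize(Sum([If(pick[d], score, 0)
                          for g in guidelines for d, score in guidelines[g].items()]))

        if opt.check() == sat:
            m = opt.model()
            print([d for d in pick if is_true(m[pick[d]])])  # e.g. ['ace_inhibitor', 'metformin']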

    On the Complexity of Scheduling in Wireless Networks

    We consider the problem of throughput-optimal scheduling in wireless networks subject to interference constraints. We model the interference using a family of K-hop interference models, under which no two links within a K-hop distance can successfully transmit at the same time. For a given K, we can obtain a throughput-optimal scheduling policy by solving the well-known maximum weighted matching problem. We show that for K > 1, the resulting problems are NP-Hard and cannot be approximated within a factor that grows polynomially with the number of nodes. Interestingly, for geometric unit-disk graphs that can be used to describe a wide range of wireless networks, the problems admit polynomial time approximation schemes within a factor arbitrarily close to 1. In these network settings, we also show that a simple greedy algorithm can provide a 49-approximation, and the maximal matching scheduling policy, which can be easily implemented in a distributed fashion, achieves a guaranteed fraction of the capacity region for "all K." The geometric constraints are crucial to obtain these throughput guarantees. These results are encouraging as they suggest that one can develop low-complexity distributed algorithms to achieve near-optimal throughput for a wide range of wireless networks.
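
    A hedged sketch of the greedy maximal-matching idea under the 1-hop (K = 1) interference model: activate links in decreasing weight (e.g., queue length) order, skipping any link that shares an endpoint with an already activated link. The link set and weights are invented; this illustrates the policy only, not the paper's capacity-region analysis.

        def greedy_maximal_matching(links):
            """links: list of (weight, u, v) tuples; returns the list of activated links."""
            busy, schedule = set(), []
            for w, u, v in sorted(links, reverse=True):    # heaviest links first
                if u not in busy and v not in busy:        # no shared endpoint => no 1-hop conflict
                    schedule.append((u, v))
                    busy.update((u, v))
            return schedule

        links = [(5, "a", "b"), (4, "b", "c"), (3, "c", "d"), (2, "d", "a")]
        print(greedy_maximal_matching(links))              # [('a', 'b'), ('c', 'd')]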

    Calculating Ensemble Averaged Descriptions of Protein Rigidity without Sampling

    Previous works have demonstrated that protein rigidity is related to thermodynamic stability, especially under conditions that favor formation of native structure. Mechanical network rigidity properties of a single conformation are efficiently calculated using the integer body-bar Pebble Game (PG) algorithm. However, thermodynamic properties require averaging over many samples from the ensemble of accessible conformations to accurately account for fluctuations in network topology. We have developed a mean field Virtual Pebble Game (VPG) that represents the ensemble of networks by a single effective network. That is, the set of all possible distance constraints (or bars) that can form between a pair of rigid bodies is replaced by the average number of bars. The resulting effective network is viewed as having weighted edges, where the weight of an edge quantifies its capacity to absorb degrees of freedom. The VPG is interpreted as a flow problem on this effective network, which eliminates the need to sample. We apply the VPG to proteins for the first time, across a nonredundant dataset of 272 protein structures. Our results show numerically and visually that the rigidity characterizations of the VPG accurately reflect the ensemble averaged properties. This result positions the VPG as an efficient alternative for understanding the mechanical role that chemical interactions play in maintaining protein stability.

    Assessing the functional coherence of modules found in multiple-evidence networks from Arabidopsis

    Background: Combining multiple evidence-types from different information sources has the potential to reveal new relationships in biological systems. The integrated information can be represented as a relationship network, and clustering the network can suggest possible functional modules. The value of such modules for gaining insight into the underlying biological processes depends on their functional coherence. The challenges that we wish to address are to define and quantify the functional coherence of modules in relationship networks, so that they can be used to infer the function of as yet unannotated proteins, to discover previously unknown roles of proteins in diseases, and to better understand the regulation and interrelationship between different elements of complex biological systems. Results: We have defined the functional coherence of modules with respect to the Gene Ontology (GO) by considering two complementary aspects: (i) the fragmentation of the GO functional categories into the different modules and (ii) the most representative functions of the modules. We have proposed a set of metrics to evaluate these two aspects and demonstrated their utility in Arabidopsis thaliana. We selected 2355 proteins for which experimentally established protein-protein interaction (PPI) data were available. From these we have constructed five relationship networks, four based on single types of data (PPI, co-expression, co-occurrence of protein names in scientific literature abstracts, and sequence similarity) and a fifth one combining these four evidence types. The ability of these networks to suggest biologically meaningful grouping of proteins was explored by applying Markov clustering and then by measuring the functional coherence of the clusters. Conclusions: Relationship networks integrating multiple evidence-types are biologically informative and allow more proteins to be assigned to a putative functional module. Using additional evidence types concentrates the functional annotations in a smaller number of modules without unduly compromising their consistency. These results indicate that integration of more data sources improves the ability to uncover functional associations between proteins, both by allowing more proteins to be linked and by producing a network whose modular structure more closely reflects the hierarchy in the Gene Ontology.
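
    A hedged illustration of the two coherence aspects defined above, using invented module assignments and GO annotations: (i) how fragmented each GO term is across modules and (ii) the most representative (most frequent) GO term of each module. These simple counts are stand-ins, not the paper's exact metrics.

        from collections import Counter, defaultdict

        # Invented data: protein -> module, and protein -> set of GO term annotations.
        modules = {"p1": "m1", "p2": "m1", "p3": "m2", "p4": "m2", "p5": "m2"}
        go_terms = {"p1": {"GO:0006355"}, "p2": {"GO:0006355"},
                    "p3": {"GO:0006355", "GO:0009607"}, "p4": {"GO:0009607"}, "p5": {"GO:0009607"}}

        term_to_modules = defaultdict(set)
        module_terms = defaultdict(Counter)
        for protein, module in modules.items():
            for term in go_terms[protein]:
                term_to_modules[term].add(module)
                module_terms[module][term] += 1

        fragmentation = {t: len(ms) for t, ms in term_to_modules.items()}          # modules per GO term
        representative = {m: c.most_common(1)[0][0] for m, c in module_terms.items()}  # dominant term per module
        print(fragmentation)    # e.g. {'GO:0006355': 2, 'GO:0009607': 1}
        print(representative)   # e.g. {'m1': 'GO:0006355', 'm2': 'GO:0009607'}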